Prototype references #6433

Draft: wants to merge 100 commits into main
Conversation

@pmeier (Collaborator) commented Aug 17, 2022

I don't want to merge this PR. It is more like a feature branch that we can discuss on. For the actual port, we can either clean up this PR or use it as a starting point for another.

@datumbox (Contributor) left a comment


Looks good overall. Just a few comments. Feel free to ignore if it's too early.

references/classification/train.py (outdated)
if mixup_or_cutmix:
    batch_transform = transforms.Compose(
        [
            WrapIntoFeatures(),
@datumbox (Contributor):
Don't we have to WrapIntoFeatures unconditionally, regardless of whether we use mixup/cutmix?

@pmeier (Collaborator, Author):

The problem is that default_collate does not respect tensor subclasses. Since we use this transform afterwards, we need to wrap here. Of course, we could also wrap earlier, but that is not necessary, since the input is a plain tensor that defaults to an image and an integer that is ignored completely.
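As an aside, the behavior in question can be probed directly. A minimal sketch, assuming only torch is available; MyImage is a hypothetical stand-in for the prototype feature type, not torchvision code:

```python
import torch
from torch.utils.data import default_collate

class MyImage(torch.Tensor):
    """Hypothetical stand-in for a tensor subclass like features.Image."""
    pass

# A batch of (image, label) samples, as a DataLoader would hand them to collate.
batch = [(torch.rand(3, 4, 4).as_subclass(MyImage), 0) for _ in range(2)]
images, labels = default_collate(batch)

# Whether `images` keeps the MyImage subclass or is downgraded to a plain
# Tensor depends on the collate/stack implementation -- which is exactly the
# fragility described above.
print(type(images).__name__, tuple(images.shape))
```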

@datumbox (Contributor):

My question is: what if I don't use mixup or cutmix? Shouldn't we wrap the data into features.Image anyway? I might be missing something here. My main point is that, since we are testing the new API, we should probably wrap all inputs in their appropriate types and see how the new kernels behave (rather than relying on their default/legacy pure-tensor implementations).

@pmeier (Collaborator, Author):

We can only wrap after we have converted from PIL. This happens fairly late in the transform pipeline:

transforms.PILToTensor(),

I remember @vfdev-5 noting that on CPU the PIL kernels are faster (I don't remember whether there was a special case or other constraints; please fill in the blanks). Thus, if we want to optimize for speed, we should probably leave it as is. No strong opinion, though.
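To make the ordering constraint concrete, here is a self-contained sketch; every name below is a hypothetical stand-in, not the actual torchvision API. Wrapping into the feature subclass can only happen once a PILToTensor-style conversion has produced a tensor:

```python
import torch

class Image(torch.Tensor):
    """Hypothetical stand-in for the prototype features.Image subclass."""
    pass

def pil_to_tensor(pil_image):
    # Stand-in for transforms.PILToTensor(); the "PIL image" is faked as a
    # plain tensor so the sketch stays self-contained.
    return torch.as_tensor(pil_image)

def wrap_into_features(sample):
    # Tagging with the subclass is only possible on a tensor, i.e. after the
    # PIL -> tensor conversion has already run.
    image, label = sample
    return image.as_subclass(Image), label

sample = (pil_to_tensor(torch.rand(3, 8, 8)), 7)
image, label = wrap_into_features(sample)
print(isinstance(image, Image), label)  # True 7
```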

torchvision/prototype/transforms/__init__.py (outdated)
@pmeier pmeier reopened this Aug 18, 2022
@pmeier pmeier changed the title Prototype references/classification Prototype references Aug 24, 2022
@pmeier (Collaborator, Author) commented Aug 24, 2022

I've added the needed changes for the detection references.

@datumbox (Contributor) left a comment


@pmeier There seem to be a few issues with the scripts. See below. Might be worth doing a dummy run with very little data to confirm they work.

references/classification/train.py (outdated)
references/detection/coco_utils.py
@pmeier (Collaborator, Author) commented Sep 1, 2022

I've run the references for a few iterations with the following parameters to confirm they work:

  • Classification:

    [
        "--device=cpu",
        "--batch-size=2",
        "--epochs=1",
        "--workers=2",
        "--mixup-alpha=0.5",
        "--cutmix-alpha=0.5",
        "--auto-augment=ra",  # "ra", "ta_wide", "augmix", "imagenet", "cifar10", "svhn"
        "--random-erase=1.0",
    ]
  • Detection:

    [
        "--device=cpu",
        "--batch-size=2",
        "--epochs=1",
        "--workers=2",
        "--data-augmentation=hflip",  # "hflip", "lsj", "multiscale", "ssd", "ssdlite"
        # "--use-copypaste",  # if data_augmentation == "lsj"
    ]
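For convenience, the flag lists above can be assembled into a runnable command line; the script path below is an assumption based on the files touched in this PR:

```python
import shlex

# Smoke-test flags for the classification reference script, copied from above.
classification_args = [
    "--device=cpu",
    "--batch-size=2",
    "--epochs=1",
    "--workers=2",
    "--mixup-alpha=0.5",
    "--cutmix-alpha=0.5",
    "--auto-augment=ra",
    "--random-erase=1.0",
]

# Assumed script location; adjust to wherever the reference scripts live.
cmd = ["python", "references/classification/train.py", *classification_args]
print(shlex.join(cmd))
```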

@datumbox (Contributor) commented Nov 4, 2022

Tensor Backend + antialias=True

Reverifying the new API after the speed optimizations. Reference runs at #6433 (comment) and #6433 (comment)

Classification

Augmentation: ta_wide + random erasing + mixup + cutmix

Using githash 959af2d:

PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --model resnet50 --batch-size 128 --lr 0.5 --lr-scheduler cosineannealinglr --lr-warmup-epochs 5 --lr-warmup-method linear --auto-augment ta_wide --epochs 600 --random-erase 0.1 --label-smoothing 0.1 --mixup-alpha 0.2 --cutmix-alpha 1.0 --weight-decay 0.00002 --norm-weight-decay 0.0 --train-crop-size 176 --model-ema --val-resize-size 232 --ra-sampler --ra-reps 4 --data-path /datasets01_ontap/imagenet_full_size/061417/
# V2 Target Acc: 80.626 / 95.310 - time: 2 days, 6:04:18 - jobid: experiments/PR6433/68029
Submitted job_id: 75703
Test: EMA Acc@1 80.862 Acc@5 95.476
Training time 2 days, 0:12:39

Result: Similar accuracy, 11% faster than unoptimized V2.

Augmentation: aa + random erasing

Using githash 959af2d:

PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --model mobilenet_v3_small --epochs 600 --opt rmsprop --batch-size 128 --lr 0.064 --wd 0.00001 --lr-step-size 2 --lr-gamma 0.973 --auto-augment imagenet --random-erase 0.2 --data-path /datasets01_ontap/imagenet_full_size/061417/
# V2 Target Acc: 66.044 / 86.338 - time: 2 days, 4:55:57 - jobid: experiments/PR6433/68030
Submitted job_id: 75704
Test:  Acc@1 67.146 Acc@5 87.086
Training time 1 day, 22:05:18

Result: Similar accuracy (improvement not statistically significant), 13% faster than unoptimized V2.

Detection

Augmentation: multiscale

Using githash 959af2d:

PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --weights-backbone ResNet50_Weights.IMAGENET1K_V2 --dataset coco --model retinanet_resnet50_fpn_v2 --opt adamw --lr 0.0001 --epochs 26 --lr-steps 16 22 --weight-decay 0.05 --norm-weight-decay 0.0 --data-augmentation multiscale --sync-bn --data-path /datasets01_ontap/COCO/022719/
# V2 Target Acc: 0.415 - time: 9:49:04 - jobid: experiments/PR6433/67794
Submitted job_id: 75705
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.413
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.615
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.437
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.273
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.456
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.536
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.337
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.544
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.585
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.434
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.625
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.719
Training time 9:46:33

Result: Similar accuracy and speed.

Augmentation: ssdlite

Using githash 959af2d:

PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --dataset coco --model ssdlite320_mobilenet_v3_large --aspect-ratio-group-factor 3 --epochs 660 --lr-scheduler cosineannealinglr --lr 0.15 --batch-size 24 --weight-decay 0.00004 --data-augmentation ssdlite --data-path /datasets01_ontap/COCO/022719/
# V2 Target Acc: 0.212 - time: 1 day, 16:06:26 - jobid: experiments/PR6433/67795
Submitted job_id: 75706
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.210
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.341
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.218
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.009
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.198
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.434
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.207
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.304
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.330
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.041
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.338
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.644
Training time 1 day, 14:44:39

Result: Similar accuracy, 8% faster than unoptimized V2.

Augmentation: ssd

Using githash 959af2d:

PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --weights-backbone VGG16_Weights.IMAGENET1K_FEATURES --dataset coco --model ssd300_vgg16 --epochs 120 --lr-steps 80 110 --aspect-ratio-group-factor 3 --lr 0.002 --batch-size 4 --weight-decay 0.0005 --trainable-backbone-layers 5 --data-augmentation ssd --data-path /datasets01_ontap/COCO/022719/
# V2 Target Acc: 0.252 - time: 17:28:42 - jobid: experiments/PR6433/67796
Submitted job_id: 75707
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.254
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.421
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.264
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.056
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.272
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.437
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.238
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.346
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.367
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.091
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.409
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.599
Training time 16:43:37

Result: Similar accuracy, 4% faster than unoptimized V2.

Augmentation: lsj + copypaste

Using githash 959af2d:

PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 4 --dataset coco --model maskrcnn_resnet50_fpn_v2 --epochs 600 --lr-steps 540 570 585 --lr 0.32 --batch-size 8 --weight-decay 0.00004 --sync-bn --data-augmentation lsj --use-copypaste --data-path /datasets01_ontap/COCO/022719/
# V2 Target Acc: 0.474 / 0.416 - time: 3 days, 15:09:55 - jobid: experiments/PR6433/67791
Submitted job_id: 75998
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.480
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.682
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.526
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.318
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.516
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.626
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.371
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.592
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.621
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.447
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.657
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.768
IoU metric: segm
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.419
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.655
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.452
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.231
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.447
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.609
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.336
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.526
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.550
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.371
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.589
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.709
Training time 3 days, 14:07:56

Result: Similar accuracy and speed.

Segmentation

Using githash 959af2d:

PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 1 --dataset coco -b 4 --model lraspp_mobilenet_v3_large --lr 0.01 --wd 0.000001 --weights-backbone MobileNet_V3_Large_Weights.IMAGENET1K_V1
# V2 Target Acc: 90.5 / 54.7 - time: 2:20:45 - jobid: experiments/PR6433/67797
Submitted job_id: 75997
global correct: 90.7
average row correct: ['94.8', '72.7', '63.3', '79.2', '43.5', '28.8', '84.4', '60.7', '84.2', '28.4', '90.5', '52.2', '80.7', '74.2', '85.6', '88.2', '33.3', '67.6', '55.9', '77.9', '59.0']
IoU: ['89.6', '52.9', '59.2', '69.2', '38.5', '24.2', '79.2', '52.2', '71.6', '23.2', '56.8', '35.8', '62.3', '56.3', '72.0', '78.3', '23.1', '57.7', '37.3', '54.6', '54.7']
mean IoU: 54.7
Training time 2:20:49 

Result: Similar accuracy and speed.

Video

New Recipe

Using githash 959af2d:

PYTHONPATH=$PYTHONPATH:`pwd` python -u run_with_submitit.py --ngpus 8 --nodes 8 --cache-dataset --batch-size=12 --lr 0.2 --clip-len 64 --clips-per-video 5 --sync-bn --model s3d --auto-augment ta_wide --mixup-alpha 0.8 --cutmix-alpha 1.0 --random-erase 0.25 --train-resize-size 256 320 --train-crop-size 224 224 --val-resize-size 256 256 --val-crop-size 224 224 --data-path="/datasets01_ontap_isolated/kinetics/070618/400/"
# V2 Target Acc: 70.903 / 90.434 - time: 3 days, 6:00:05 - jobid: experiments/PR6433/72508
Submitted job_id: 75701
Training time 3 days, 3:42:15
trainrun torchrun --nproc_per_node=8 train.py --data-path="/datasets01_ontap/kinetics/070618/400/" --batch-size=16 --test-only --cache-dataset --clip-len 128 --clips-per-video 1 --model s3d --val-resize-size 256 256 --val-crop-size 224 224 --resume experiments/PR6433/75701/model_40.pth
 * Clip Acc@1 71.134 Clip Acc@5 90.486

Result: Similar accuracy, faster speed. The improvement is harder to estimate because the logs indicate an IO slowdown caused by OnTap during at least 3 epochs. The new version is consistently 6-7 minutes per epoch faster than the old, which translates to a roughly 7-8% improvement.

@NicolasHug mentioned this pull request Feb 10, 2023
6 participants